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Human mobility and activity patterns mediate contagion on many levels, including the 
spatial spread of infectious diseases, diffusion of rumors, and emergence of consensus. 
These patterns however are often dominated by specific locations and recurrent flows and 
poorly modeled by the random diffusive dynamics generally used to study them. Here 
we develop a theoretical framework to analyze contagion within a network of locations 
where individuals recall their geographic origins. We find a phase transition between a 
regime in which the contagion affects a large fraction of the system and one in which only 
a small fraction is affected. This transition cannot be uncovered by continuous determin- 
istic models due to the stochastic features of the contagion process and defines an invasion 
threshold that depends on mobility parameters, providing guidance for controlling con- 
tagion spread by constraining mobility processes. We recover the threshold behavior by 
analyzing diffusion processes mediated by real human commuting data. 
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In recent years, reaction-diffusion processes have been used as a successful modeling frame- 
work to approach a wide array of systems that along with the usual chemical and physical 
phenomena^ includes epidemic spread^ 5 * 6 * 7 * 8 * 9 *, human mobility^MOSI, information, and social 
contagion processes* 10 * 11 * 12 * 13 * 1 -^! This has stimulated the broadening of reaction-diffusion mod- 
els in order to deal with complex network substrates and complex mobility schemes^ 6 * 17 * 18 * 19 * 20 ! 
This success has allowed for the theoretical characterization of new and interesting dynamical 
behaviors and provide a rationale for the understanding of the emerging critical points that un- 
derpin some of the most interesting characteristics of techno-social systems. Those studies how- 
ever are all focused on mobility processes modeled through simple memoryless diffusive pro- 
cesses. The recent accumulation of large amounts of data on human mobility ^ 1 * 22 * 23 * 24 * 25 ^ from 
the scale of single individuals to the scale of entire populations presents us with new challenges 
related to the high level of predictability and recurrenc e * 27 * 28 * 29 * found in mobility and diffusion 
patterns from real data. For instance, commuting mobility denoted by recurrent bidirectional 
flows among locations dominates by an order of magnitude the human mobility network at the 
scale of census areas defined by major urban areas 30 . The effect of highly -predictable or recur- 
rent features of particles/agents mobility in the large-scale behavior of contagion processes how- 
ever cannot be studied by a simple adaptation of previous theoretical frameworks * 3 1 l 32 l 33 l 34 l 35 l 36 l 
and call for specific methodologies and approximations capable of coping with non-markovian 
diffusive processes in complex networks. 
Modeling commuting networks. 

In order to start investigating the effect of regular mobility patterns in reaction-diffusion systems 
we have considered the prototypical example of the spread of biological agents and information 
processes in populations characterized by bidirectional commuting patterns. In this case we 
consider a system made of V distinct subpopulations. The V subpopulations form a network 
in which each subpopulation i has a population made of Ni individuals and is connected to a 
set of other subpopulations v(i). The edge connecting two subpopulations i and j indicates 
the presence of a flux of commuters. We assume that individuals in the subpopulation i will 
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visit anyone of the connected subpopulations with a per capita diffusion rate <7j. As we aim at 
modeling commuting processes in which individuals have a memory of their location of origin, 
displaced individuals return to their original subpopulation with rate r _1 . 

Real data from commuting networks add an extra layer of complexity to the problem. In 
Fig. 1 we display the cumulative distributions of the number of commuting connections per 
administrative unit and the daily flux of commuters on each connection in the United States and 
France. The networks exhibit important variability in the number of connections per geographic 
area. Analogously, the daily number of commuters on each connection is highly heterogeneous, 
distributed in a wide range of four to six orders of magnitude. These properties, often mathe- 
matically encoded in a heavy-tailed probability distribution, have been shown to have important 
consequences for dynamical processes, altering the threshold behavior and the associated dy- 
namical phase transition* 31 * 32 * 33 * 37 * 38 * 39 * 40 *. In order to take into account the effect of the network 
topology we use a particle-network framework in which we consider a random subpopulation 
network with given degree distribution P(k) and denote the number of subpopulations with k 
connections by Furthermore, we assume statistical equivalence for subpopulations of sim- 
ilar degree. This is a mean-field approximation that considers all subpopulations with a given 
degree k as statistically equivalent, thus allowing the introduction of degree-block variables that 
depend only upon the subpopulation degree*^. While this is an obvious approximation of the 
system description, it has been successfully applied to many dynamical processes on complex 
networks and it is rooted in the empirical evidence gathered in previous works^^ 2 * 23 * 33 l For 
the sake of analysis we will assume that the average population in each node of degree k fol- 
lows the functional form Nk = Nk/(k) where N = J2k NkP(k) is the average number of 
individuals per node in the subpopulation network. This expression represents the stationary 
population distribution in the case of a simple random diffusive process in which the diffusion 
rate of individuals along each link leaving a node of degree k has the form 1 / /cESED Moreover, 
the empirical data from various sources suggest similar population scaling arise as a function of 
their connectivity to other populations* 19 * 22 * 23 *. 
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In order to approach the spreading process in the subpopulation network analytically, we 
define mixing subpopulations 68 that identify the number of individuals N kk /(t) of the subpop- 
ulation k present in subpopulation k' at time t (see Fig. 2). We consider that the diffusion rate 
a kk i is a function of the degree k and k' of the origin and destination subpopulations, respec- 
tively, with cr fc = J2k'ev(k) a kk' and r k depending only on the degree of the origin subpopulation. 
In particular, if cr k <^ r^ 1 and we study the system on a time scale larger than the time scale 
of the commuting process r k one can consider a quasi- stationary approximation in which the 
mixed subpopulations assume their stationary values: 

Nk 

Nkk ~ 77\7i , \ ' 

(k)(l + a k T k ) 

Nka kk ,T k 

+ O k T k ) 

These expressions (see the Methods section) allow us to consider the subpopulation k as if 
it had an effective number of individuals Nkk' "C Nkk m contact with the individuals of the 
neighboring subpopulation k' in a quasi- stationary state reached whenever the time scale of the 
dynamical process we are studying is larger than r k . For the sake of the analytical treatment 
in the following we will consider in the commuting rates only the dependence on the degree 
classes. More complicated functional forms including explicitly the spatial distance may be 
considered and we will analyze this case by performing data-driven simulations. 
Contagion processes and the invasion threshold. 

In analyzing contagion processes in this system we consider the usual susceptible-infected- 
recovered (SIR) contagion model 41 . Within each subpopulation the total number of individuals 
is partitioned into the compartments S(t), I(t) and R(t), denoting the number of susceptible, 
infected, and removed individuals at time t, respectively. The basic SIR rules thus define a 
reaction scheme of the type S + / — > 21 with reaction rate (3 and I —¥ R with reaction rate /i, 
which represent the contagion and recovery processes, respectively. The SIR epidemic model 
conserves the number of individuals and is characterized by the reproductive number R = 
P/fji that determines the average number of infectious individuals generated by one infected 
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individual in a fully-susceptible population. The epidemic is able to generate a number of 
infected individuals larger than those who recover only if R > 1, yielding the classic result 
for the epidemic threshold 41 ; if the spreading rate is not large enough to allow a reproductive 
number larger than one (i.e., (3 > fi), the epidemic outbreak will affect only a negligible portion 
of the population and will die out in a finite amount of time. 

While this result is valid at the level of each subpopulation, each subpopulation may or may 
not transmit the infection or contagion process to another subpopulation it is in contact with, 
depending on the level of mixing among the subpopulations. In other words, the mobility pa- 
rameters (Tfc and Tfc influence the probability that individuals carrying infection or information 
will export the contagion process to nearby subpopulations. If the diffusion rate approaches 
zero the probability of the contagion entering neighboring subpopulations goes to zero as there 
are no occasions for the carriers of the process to visit them. On the other hand if the return rate 
is very high, then the visiting time of individuals in neighboring populations is so short that they 
do not have time to spread the contagion in the visited subpopulations. This implies the presence 
of a transition™ 4 ^ 43 ^ between a regime in which the contagion process may invade a macro- 
scopic fraction of the network and a regime in which it is limited to a few subpopulations (see 
Fig. 2 for a pictorial illustration). In this perspective we can consider the subpopulation network 
in a coarse-grained view and provide a characterization of the invasion dynamics at the level 
of subpopulations, translating epidemiological and demographic parameters into Levins-type 
parameters of extinction and invasion rates. Let us define D® as the number of subpopulations 
of degree k affected by the contagion at generation 0, i.e., those which are experiencing the out- 
break at the beginning of the process. Each subpopulation invaded by the contagion process will 
seed - during the course of the outbreak - the contagion process in neighboring subpopulations, 
defining the set D\ of invaded subpopulations at generation 1, and so on. This corresponds 
to a basic branching process ^ * 33 ! 42 * 45 ! 46 * where the nth generation of infected subpopulations of 
degree k is denoted by D%. In order to describe the early stage of the subpopulation invasion 
dynamics we assume that the number of subpopulations affected by a contagion outbreak (with 
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Ro > 1) is small and we can therefore study the evolution of the number of subpopulations af- 
fected by the contagion process by using a tree-like approximation relating D% with D^ 1 . As 
it is shown in the Methods section, in the case of R ~ 1, it is possible to derive the following 
recursive equation 

Dl = {Ro-l) k -^Y.D n k 7\k'-l)\ k , k . (3) 

This relation carries explicit dependence on the network topology through the degree distribu- 
tion P(k) and the factor \ k 'k that is the number of contagious seeds that are introduced into a 
fully-susceptible population of degree k from a neighboring population of degree k! . If the time 
scale of the disease is considerably larger than the commuting time scale, that is in our case 
pT 1 3> r, we can consider the infectious individuals in the mixing subpopulation to assume 
their stationary values according to Eq. Q. The quantity \ k 'k can therefore be expressed as the 
total number of infected individuals in the mixing subpopulation by X^k = (Nk'k + N^w) 
where a is the fraction of individuals that are affected by the contagion by the end of the SIR 
epidemic. The first term in the right-handside of the above expression accounts for the total 
visits of infectious people from source subpopulation k! to target subpopulation k. While the 
second term counts for the visits of individuals from the target subpopulation to the source sub- 
population, during which they acquire infection and carry the contagion back to their origin. If 
we use the steady state expression in equation Q, and we consider that a for the SIR dynam- 
ics can be explicitly written for Ro ~ 1 it is possible to write an explicit form of the iterative 
equation ([3]), whose dynamical behavior is determined by the branching ratio 

fl - ^g(i^/Xp) F(W - (fc2> - (t3) - (t4)> ' (4) 

where p = or is the ratio of commuting to return rate and for the sake of simplicity we have 
considered that the per capita commuting rate o and return rate r _1 are the same for all subpop- 
ulations. In the above expression F is a function only of the moments of the degree distribution 
of the subpopulation network. R* is therefore equivalent to a basic reproductive number at 
the subpopulation level, defining the average number of supopulations to which each infected 
subpopulation will spread the contagion process. R* thus defines the invasion threshold as 
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any contagion process will spread globally in the network system only if i?* > 1. The sub- 
population branching process is inherently considering the stochastic effects of the epidemic 
dynamics in the probability of contagion from one subpopulation to the other. It is interesting 
to note that the invasion threshold cannot indeed be derived in continuous deterministic models 
where stochastic effects are neglected. 
Phase diagram and the network structure. 

For fixed disease and network parameters, the condition = 1 of Eq. (|4]) defines critical value 
for p that allows for the spreading of the contagion process. Thus there are two parameters 
underlying the mobility dynamics that we can either hold fixed or let free. In Fig. 3 we show 
the phase diagram in the a-r space separating the global invasion from the extinction regime. 
The phase diagram tells us that, all parameters being equal, the rate of diffusion to nearby sub- 
populations has to be larger than a c to guarantee the spreading of the contagion. Analogously, 
if we allow r to vary, we observe that the global spreading of the contagion process can be 
achieved by extending the visit times r of individuals in nearby subpopulations above a definite 
threshold t c . The explicit expressions of the threshold values can be found in the Supplementary 
Information. 

Another very interesting feature of the above threshold condition is the explicit effect of the 
network topology encoded in the moments of the degree distribution. Indeed, the heterogeneity 
of the network favors the global spread of the contagion process by lowering the threshold value. 
In the Supplementary Information we show that in the case of heavy-tailed degree distribution 
the threshold virtually reduces to zero for infinitely large system sizes. Even at finite size, 
however, the threshold value is generally smaller for networks with greater heterogeneity as 
is shown in Fig. 3, which compares the phase diagrams of heterogeneous and homogeneous 
networks of same size. In order to test the validity of the analytical picture obtained here, 
we have performed an extensive set of Monte Carlo numerical simulations of the contagion 
process in large subpopulation networks. The simulations are individual based and consider the 
commuting and contagion dynamics microscopically with no approximations as detailed in the 
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Supplementary Information. The substrate network is given by an uncorrelated random complex 
network^ generated with the uncorrelated configuration modeP^ to avoid inherent structural 
correlations. In Fig. 3 we report the results for a network with Poissonian degree distribution 
and a network with power-law degree distribution P{k) ~ k~ 2A . Individuals are distributed 
heterogeneously in each subpopulation according to the relation Nk = Nk/(k), where iV = 
10 4 . Although the analytical phase diagram has been derived by using several approximations, 
it matches the numerical simulations qualitatively and quantitatively, as shown by the good 
agreement of the analytical phase boundary and the numerical simulations in Fig. 3b. We also 
report in Fig. 4a the behavior of the number of invaded populations as a function of commuting 
rates. The phase transition between the invasion and extinction regimes at a specific value of 
p = err is clearly observed in the microscopic simulations. 
Data-driven simulations. 

As a further confirmation of the validity of the theoretical results we have tested our results 
in a real-world setting. We have considered the commuting network of all counties in the 
continental US as obtained by the US Census 2000 data™ In this dataset each subpopulation 
represents a county and a connection the presence of commuting flow between two counties. In 
the simulation each county is associated with its actual population and each link with a specific 
commuting rate from the real data. We have considered only short-range commuting flows up 
to 125 miles. The visit time has been considered to be of the order of a working day (8 hours). 
On this real data layer we have simulated the spreading of an SIR contagion process and studied 
the number of infected counties as a function of the global rescaling factor of the commuting 
rates. It is remarkable to observe that in the case of the real data a clear phase transition exists 
between the two regimes at a critical value of the global rescaling factor of the commuting rates. 
In Fig. 4 we also illustrate the different behavior of the contagion process in the two regimes 
by mapping the number of infected counties in the US as a function of time. 
Conclusions. 

While the presented results are anchored upon the example of disease spread, the metapopula- 
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tion approach can be abstracted to the phenomena of knowledge diffusion, online community 
formation, information spread, and technology. In all these examples, we have individuals 
stationed primarily in well-defined subpopulations, with occasional interactions with other sub- 
populations governed by interaction rates similar in scheme to those presented here. While 
most of the studies in defining epidemic threshold have focused on single populations, it is 
clear that more attention must be devoted to the study of the spread in structured populations. 
In this case the understanding of the invasion threshold is crucial to the analysis of large-scale 
spreading across communities and subpopulations. The theoretical approach presented in this 
paper opens the path to the inclusion of more complicated mobility or interaction scheme and 
at the same time provides a general framework that may be used not just as an interpretative 
framework but a quantitative and predictive framework as well. Understanding the effect of mo- 
bility and interaction patterns on the global spread of contagion processes can indeed be used 
to devise enhanced or suppressed spread by acting on the basic parameters of the system in the 
appropriate way, which might find applications ranging from the protection against emerging 
infectious diseases to viral marketing. 
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Methods 

Stationary populations. Rate equations characterizing the commuting dynamics among subpopulations 
can be defined by using the variables N kk (t) and N kk * (t) as 

d t N kk (t) = -a k N kk (t) + T~ l kY,N kk ,(t)P(k'\k) , (5) 

k> 

d t N w {t) = a kkl N kk {t)-T- l N kk ,(t) , (6) 

where a kk i is the rate at which an individual of subpopulation k commutes to neighboring subpopulation 
k'. Then, considering the statistical equivalence of subpopulations with the same degree and the mean 
field assumption we have a k = k J2 k > & kk iP(k'\k) where P(k'\k) is the conditional probability of having 
a subpopulation k' in the neighborhood of a subpopulation k. Equilibrium is given by the condition 
dtN kk = dtN kk i = and yields the relation 

N kk > = N kk a kk 'T . (7) 

Using the expression N k = N kk (t) + k Y^ k > N kk i{t)P{k'\k) for total number of individuals of subpopu- 
lation k one can obtain the stationary populations in equations ([T} and (|2]). 

Branching process. Each subpopulation of degree k' invaded by the contagion process at the n — 
1th generation may seed its k' — 1 neighbors at most (all of its neighbors minus the one from which 
it got the infection). The probability of finding a subpopulation of degree k in the neighborhood is 
P(k\k'). For each neighboring subpopulation, the probability that it has not already been invaded by the 
contagion process in an earlier generation is nm=o(l — D^/Vk). If \ k / k infectious seeds are sent to the 
neighbor, the outbreak occurs with probability 1 - R Q fe,fe ESI. We can then relate the number of diseased 
subpopulations at the nth generation with that at the n — 1th generation as the simultaneous realization 
of all these above conditions, 

D% = Ek>D n k rHk'-i) 

In the early stage of the contagion process we can assume that flm'Lol-'- ~~ ^fcV^) — 1- ^ e wn ^ a ^ so 
consider the case that we are just above the local epidemic threshold, Rq — 1 <C 1, so that the outbreak 
probability can be approximated by 1 — R Xk ' k ~ (Ro — l)X k / k . If we also ignore degree correlations 
between neighboring subpopulations, P(k\k') = kP{k)/ (k) 40 , we obtain equation 
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Invasion threshold. In order to obtain the explicit expression for the subpopulation reproductive number 
in equation Q we need to derive an expression for Xyj. = (N^'k + ^kk') ol. This expression depends 
on the form of commuting rates among subpopulations. We consider the case in which 

° hk ' a N k + N^ n ' 

where N£ n = kJ2k' ^k'P(k'\k) is the average total population in the neighborhood of subpopulation k. 
The above expression assumes that the per-capita mobility rate is rescaled by the number of individuals 
in the subpopulation 8 , thus leading to a k that decreases as Nk increases. This behavior account for 
the effect introduced by large subpopulation sizes; the overall per capita commuting rate outside of the 
subpopulation generally decreases in large populations as individuals tend to commute internally. In this 
case we obtain 

(k)k' 

aW = ° ((k) + (k 2 ))k ■ (10) 
This expression allows the calculation of N^/ and using the approximate relation for the fraction of 
infected cases generated by the end of the SIR epidemic 41 introduced into a fully susceptible population 
a ~ 2(Rq — 1)/Rq, we obtain the expression for Xk'k'- 

2N(R - l)p 
Ak ' k Rt(k*)(l + (k)/(ki)+p) 

If we substitute the above relation into equation ([3]) we get 



(k' + k) . (11) 



D * = Jw/Wi^ ' 1 ^ - 1)(k+LJ) ■ (12) 

In order to write a closed form of the above iterative process we introduce the definitions 0q = J2k(k ~ 
l)Dfr and 0" = J2kk(k — 1)D% whose next generation equations are defined as 



Q n = GG n l with n 

where G is a 2 x 2 matrix, 



^ 0? 



(13) 



q 2iV( J R - l) 2 p 



Rl(k*)(k)(l + (k)/(k2)+p) 



' (k 3 ) - (k 2 ) (k 2 ) - (k) 
K (k A ) - (k 3 ) (k 3 ) - (k 2 ) 



(14) 
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The global behavior of the contagion process across the network of subpopulations is determined by the 
largest eigenvalue i?* of G as expressed in equation Q where F is a function of the moments of degree 
distribution, 



F((k), (k 2 ), (k 3 ), (k*)) ee jf^-Aik 3 ) - (k 2 ) + (<fc 4 > - (k 3 ))^ ((k 2 ) - (k)) 



(k){k 2 ) 



1/2 



1/2' 



(15) 
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Figure 1 : Statistical properties of commuting networks in the United States and France, a, Com- 
muting network in the United States at the level of counties (http://www.census.gov/). b, Commuting 
network in France at the level of municipalities (http://www.insee.fr/). Cumulative distributions of the 
number of connections (left) and the number of daily commuters (center) per administrative unit, as well 
as the number of daily commuters on each connection (right) are displayed. The networks are highly 
heterogeneous in the number of connections as well as in the commuting fluxes. 
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Figure 2: Illustration of the subpopulation invasion dynamics, a, Mixing of two subpopulations 
and contagion dynamics due to commuting at the microscopic level. At any time subpopulation i is 
occupied by a fraction of its own population Na and a fraction of individuals Nji whose origin is in 
neighboring subpopulation j. The figure depicts the flux of individuals back and forth between the two 
subpopulations due to commuting process. This exchange of individuals is the origin of the transmission 
of the contagion process from subpopulation % to subpopulation j. The contagion process is mediated by 
contacts between infectious (red particles) and susceptible (yellow particles) individuals, b, Macroscopic 
representation of invasion dynamics. Nodes are organized from left to right according to their generation 
index n. Arrows indicates the transmission of the contagion process from a diseased subpopulation at 
the n — 1th generation to a subpopulation at the nth generation. 
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Figure 3: Phase diagrams separating the global invasion regime from the extinction regime, a, Plot 
of the equation Q in the cr-r space. The red and black lines identify the = 1 relation for the homo- 
geneous and heterogeneous uncorrected random networks, respectively. The global spreading regime is 
in the region of parameters indicated by shaded areas. The networks are made of V = 10 4 subpopula- 
tions, each of which accommodates a degree dependent population of = Nk/(k) individuals, with 
N = 10 4 . Both networks have the same average degree in which the heterogeneous network has degree 
distribution P(k) ~ k~ 2A and the homogeneous network has Poisonian degree distribution. The SIR dy- 
namics is characterized by Rq = 1.25 and p^ 1 = 15 days, b, Numerical simulations on heterogeneous 
networks. The system assumes the same parameter values of (a). Color scale from black to yellow is 
linearly proportional to the number of infected subpopulations. Black indicates an invasion of less than 
0.1% of subpopulations and yellow indicates an invasion of more than 10% of subpopulations. 
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Figure 4: Dynamical behavior of an SIR epidemic on the real US commuting network data, a, 

Average fraction of infected subpopulation as a function of commuting rates in networks with the same 
statistical properties as the heterogeneous network in Fig. 3a. Visit time in this case is fixed at r = 1 day. 
b, Average fraction of infected subpopulations as a function of the intensity of commuting fluxes in the 
US. We study the system behavior by varying all commuting rates aij between county pairs by a factor 
co as — > ujcjij. Visit time assumes a realistic value of r = 8 hours. The infection is initially 
seeded in Los Angeles. The data considers only real commuting flows up to 125 miles and the actual 
county populations (see text), c, Temporal progression of average cumulative number of infected cases 
in the subcritical and supercritical regimes of the invasion dynamics. The rescaling factors used in these 
simulations are marked in (b). The SIR dynamics assumes Rq = 1.25 and fi~ l = 3.6 days in both 
cases. 
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Supplementary Information 
Invasion Threshold 

The global behavior of the contagion process is determined by the largest eigenvalue i?* of the subpop- 
ulation next generation matrix G as detailed in the Methods section of main paper. If the eigenvalue 
i?* > 1 we have that the subpopulation invasion process is supercritical and the disease will be able to 
globally spread across subpopulations. This is equivalent to define a subpopulation reproductive number 
-Rj 32|33|42|43|44 | that in structured metapopulation systems is equivalent to basic reproductive number Rq 
at the single population level: 

R *= WcT^T^ ^ (W,(fc 2 ),(fc 3 ),(fc 4 )) , (16) 
Rq( 1 + ( k )/( k ) + P) 



where 



F( 



«*>, {k 2 ), (k 3 ), <fc 4 » = ^ [(k 3 ) - (k 2 ) + ((A; 4 ) - <fc 3 » 1/2 ((k 2 ) - (k)) L/2 \ . (17) 

The infectious diseases will spread globally in the metapopulation system only if R* > 1. Thus, by 
setting R* = 1, we can define an epidemic threshold relation for the mobility ratio p, 

D = i+mm (m 

Pc 2Ar(l- J R 1 )2F((fc),(A;2),(A ; 3),(^))-l ' 

below which the infection remains confined to a small number of subpopulations. In an infinite metapop- 
ulation system the threshold is defined rigorously and the fraction of infected subpopulations is zero 
below the threshold and finite only if the mobility parameters set the ratio p above the threshold value. 
The threshold value is defined for the ratio between the rates characterizing the mobility process. This 
condition is therefore twofold on the mobility dynamics if we fix one of the two parameters a and r, and 
let the other parameter free. On one hand the threshold relation is a c = p c t , 

= (l + (k)/(k 2 ))T-l 

c 2N(1- R^) 2 F((k),(k 2 ),(k^,(k^) - I 
This intuitively states that the rates of diffusion has to be large enough (a > a c ) to guarantee the spread- 
ing of the disease. Interestingly, however, we can also define the threshold relation for r by r c = p c a~ l , 

(l + (k)/(k 2 ))a^ 



2N(1- R^) 2 F((k),(k 2 ),(k 3 ),(k^)) 
21 



(20) 



Nature Physics (2011) | doi:10.1038/nphys1944 Balcan & Vespignani 



that is telling that the global spreading of the contagion porcess can be achieved by reducing the return 
rates of individuals; in other words by extending the visit times of individuals in nearby subpopulations 
(r > r c ). This last conditions however breaks down when r becomes much larger than the contagion 
time scale thus breaking the time-scale separation 8 assumption used here. 

Another very interesting feature of the above threshold condition is the explicit effect of the network 
topology encoded in the moments of degree distribution (k), (k 2 ), etc. As already been observed in the 
case of Markovian diffusion case™, the heterogeneity of the network favors the global spread of the 
epidemic by lowering the threshold value. Indeed, for heavy-tailed degree distribution P(k) ~ A;" 7 with 
7 > 1, the nth moment scales as & m ti -7 if n > 7 — 1 and k max 3> A; mm . This means that for n > 7 — 1, 
the nth moment of the degree distribution tends to diverge in the infinite size limit of the network, as in 
this limit k max — > 00, virtually reducing the threshold to zero. Even at finite size, however, the threshold 
value is generally smaller the higher the network heterogeneity is. In order to make this last statement 
transparent, we will turn our attention to the scaling of the moments of degree distribution for very large 
system sizes. In the case of 1 < 7 < 2, the term (k) / (k 2 ) in the nominator of p c scales as 

~ k' 1 (2\~\ 



In the range 2 < 7 < 3, the scaling is 

(k) 



~A&£ • (22) 



(k 2 ) 

In all the other cases of 7 > 3, the term (k) / (k 2 ) has a finite value. That means that the nominator of p c 
is finite for any 7 > 1. Now lets turn our attention to the denominator of p c and analyze the scaling of 

F((k),{k 2 },{k 3 },(k 4 }). In the case of 1 < 7 < 2, F scales as 

F((k), (k 2 ), (fc 3 ), (fc 4 )) ~ (P) + ~ *3£ • (23) 

In the case of 2 < 7 < 3, second moment (k 2 ) in the denominator and higher moments in the numerator 
are dominant, leading to the scaling relation: 

F«fc>, (k 2 ), (k 3 ), (A; 4 )) ~ (fc3) + ( ^ ) 2 1 ) /2(fc2)1/2 ~ AW • (24) 
In the range 3 < 7 < 4, only the third moment (A: 3 ) in the numerator dominates, thus 

F^^^A^A^-ik 3 )-^ ■ (25) 
22 



Nature Physics (2011) | doi:10.1038/nphys1944 



Balcan & Vespignani 



In the range 4 < 7 < 5, only the fourth moment (k 4 ) in the numerator dominates, leading to 

F((k),(k 2 ),(k 3 )Ak 4 ))-(k 4 ) 1/2 ~4 5 a - x 7)/2 ■ (26) 

The above expressions state that for any heavy-tailed degree distribution with exponent 7 < 5,F((k), {k 2 ), (fc 3 ), (A: 4 )) 
tents to diverge in the limit of infinite network size, which in turn pushes the threshold value p c to zero. 
While, on the other hand, if 7 > 5 then F({k), (k 2 ), (k 3 ), (k 4 )) has a finite value. 

Computational model 

Inside each subpopulation we consider an SIR epidemic model 41 , in which each individual is classified 
by one of the discrete disease states at any point in time. The rate at which a susceptible person in subpop- 
ulation i acquires the infection, the so-called force of infection 41 Aj, is determined by interactions with 
infectious individuals. The force of infection Aj acting on each susceptible individual in subpopulation i 
has been assumed to follow the mass action principle 

\J t ) = p-^L , (27) 
v ; N*(t) ' v 

where j3 is the transmission rate of infection and I*(t)/N*(t) is the prevalence of infectious individuals 
in the subpopulation. Each person in the susceptible compartment (S) contracts the infection with proba- 
bility Xi(t)At and enters the infectious compartment (I), where At is the time interval considered. Each 
infectious individual permanently recovers with probability fiAt, entering the recovered compartment 
(R). 

Synthetic subpopulation networks. 

Generation of substrate networks. In order to compare with theoretical calculations, topologically un- 
corrected random graphs have been considered. In this case, analytical calculations show that epidemic 
invasion threshold only depends on the degree distribution of the subpopulation networks. In order to 
verify this result, two different network topologies have been generated: 

• Erdos-Renyi graphs^ have been synthetized by assigning a link between each pair of nodes with 
probability (k)/(V — 1), where V is the number of nodes and (k) is a prescribed average node 
degree. 
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• Networks with power-law degree distribution, P(k) ~ k" 1 with fc m ; n < k < k m & x , have been 
generated by uncorrelated configuration model^^EE] All the scale-free networks have been gener- 
ated by setting 7 = 2.1 and k mm = 2. 

For the sake of comparison, the average degree of Erdos-Renyi graphs has been set to that of scale-free 
networks. 

Subpopulation sizes. From a pool of NV people, a population size N{ is assigned to each subpopulation i , 
defining its permanent residents. The population size is chosen at random from a multinomial distribution 
with probability proportional to k{, which ensures the metapopulation system to obey = Nk/ (k). 
Mobility parameters. The rate aij at which a resident of subpopulation i commutes to a neighboring 
subpopulation j G v(i) assumes 

an = a . (28) 

Each resident in subpopulation i leaves its origin and visits subpopulation j with probability cr^ At. A 
commuter in subpopulation j returns back to its resident subpopulation i with probability r^ 1 At. 

Simulations have been initialized with 1(0) = 10 infectious individuals, seeded randomly in a single 
subpopulation of degree /c m ; n , while the rest of the population is assumed to be susceptible to infection. 
Real- world subpopulation networks. 

Realistic simulations have been performed using the county to country commuting network in the con- 
tinental United States^. The network consists of about 3, 100 nodes, each of which corresponds to a 
US county. Weighted link from node i to node j represents the daily number of commuters from county 
i to county j. Thus, the population size of each node and the commuting rates among them are fully 
determined by the data. The return rate r , however, has been set to t _1 = 3 day^ 1 corresponding to 
a regular working day (8 hours). Simulations have been initialized with 1(0) = 10 infectious individuals 
seeded in Los Angeles County, California. 
Statistical analysis. 

Since we aim at determining the epidemic invasion threshold, we have let the metapopulation system 
run until the infection dies out. In the numerical results presented in main paper, all the realizations 
resulting in at least one diseased subpopulation have contributed to the statistical analysis. For each set 
of parameters, we have generated at least 1 , 000 system realizations. Since the subpopulation networks 
and dynamical processes on them are subject to fluctuations in the case of synthetic populations, we have 
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sampled at least 10 — 20 network realizations and 100 — 200 dynamical realizations on each of them. 
While in the case of the real- world scenarios, we have generated at least 1, 000 dynamical realizations. 
Contagion and mobility dynamics. 

We will follow the notations defined in main paper and represent each individual by its disease state X, 
its permanent subpopulation % and its present subpopulation j G v(i). Since all the individuals sharing 
the same three indices (X, i, j) are identical in terms of the dynamical processes, we are going refer to 
the number of such individuals at time t by Xij(t). Then, by definition, the instantaneous compartment 
size Xj (t) in subpopulation j can be expressed as 

X*(t) = X jj (t)+ £ X tj (t) , (29) 
eev(j) 

and the total number of individuals as N* = J2x X*. The number of individuals in each compartment X 
with a residence in % and present in j is subject to discrete and stochastic dynamical processes defined by 
disease and transport operators. The disease operator Vj represents the change due to the compartment 
transition induced by the infection dynamics, and the transport operator fix represents the variation due 
to mobility. 

The term Vj can be written as a combination of a set of transitions {Vj(X,Y)}, where Vj(X,Y) 
represents the number of transitions from compartment X to Y and is simulated as an integer random 
number extracted from a multinomial distribution. Then the change due to infection dynamics reads as 

V j (X) = Y,[V j (Y,X)-V j (X,Y)] . (30) 

Y 

As a concrete example let us consider the temporal change in the infectious compartment. There is only 
one possible transition from the compartment, that is to the recovered compartment. The number of 
transitions is extracted from the binomial distribution 

Pr Binom (/, J (t), Wij _^.) , (31) 

determined by the transition probability 

Phj^Rij = , (32) 

and the number of individuals in the compartment hj{t) (its size). This transition causes a reduction 
in the size of the compartment. The increase in the compartment size is due to the transitions from 
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susceptible into infectious compartment. This is also a random number extracted from the binomial 
distribution 

Pi^GMtJ.w^) > (33) 

given by the chance of contagion 

PSy-Wy = ^(t)At , (34) 

and the number of attempts equal to the number of susceptibles Sij(t). After extracting these numbers 
from the appropriate distributions, we can calculate the total change Vj (I) in infectious compartment as 

Vj{I)=Vj{S,I)-Vj{I,R) . (35) 



Transport operator U x expresses the total change in compartment sizes due to the commuting of perma- 
nent residents of subpopulation i back and forth. The variation in Xy can be decomposed into fl x (i, j) 
and fl x (j, i) as 

n x = n^(i,j)-n$(j,i) . (36) 

The first term U x (i,j) represents an increase that is caused by the departing residents of subpopulation 
i to visit subpopulation j. The Q x (i, j) is a random number extracted from the multinomial distribution 



p r Multmom (x .. W; {pXii ^ x J £ G v{i)} ) , (37) 

determined by the probability of commuting to subpopulation j 

PXu^Xa = (JijAt , (38) 

and the number of such trails Xa(t). The second term Q x {j, i) corresponds to a reduction in and is 
due to the return trips from subpopulation j to permanent subpopulation i. The H x {ji *) i s a l so a random 
number extracted from the binomial distribution 

Pv Bi ^(X tl (t), Px ^ Xii ) , (39) 

given by the probability of returning home 

Px ij ^x u =r- 1 At , (40) 
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and the size of the compartment Xij(t). 

We have assumed that the infection does not alter people's behavior, i.e., all the compartments are iden- 
tical in their mobility. 
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